Statistical Significance and Normalized Confusion Matrices
Abstract
When assessing map accuracy, confusion matrices are frequently compared statistically using kappa. Although kappa allows individual matrix categories to be analyzed with respect to either omission or commission error rates, it is not used to compare individual matrix categories with respect to both rates concurrently. When such a concurrent comparison is desired, the matrices are typically normalized and then scrutinized cell by cell by inspection. While no parametric test of significance exists for such a cell-by-cell examination, sampling distributions for the main diagonal entries can be estimated by repeatedly resampling the original sample data with replacement (i.e., bootstrapping), allowing inferences to be made about the population. In this research, the procedure for estimating the sampling distribution of normalized cell values is described. Three methods for determining the standard error of these sampling distributions are also outlined. Using these sampling distributions and their attendant standard errors, the statistical comparison of cell values from two normalized confusion matrices is illustrated. One illustrated method requires a mild parametric assumption, whereas the other is completely nonparametric; nevertheless, the two bootstrap methods produce nearly identical results.
Introduction
In remote sensing and geographic modeling, disagreement between nominal maps and reality is frequently tabulated and displayed in a confusion matrix. When multiple classification or modeling methods are used, the resulting confusion matrices are typically compared for significant differences. Because it is one of the few measures that can be tested for significance, Cohen's kappa (κ) (Cohen, 1960) has been the preferred statistic for this comparison.
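As a point of reference for the kappa statistics discussed in this section, the following is a minimal Python sketch of Cohen's overall kappa, κ = (p_o − p_e)/(1 − p_e), and of conditional kappa in the form given by Rosenfield and Fitzpatrick-Lins (1986). The function names, the list-of-lists layout, and the convention that rows hold predicted (map) classes while columns hold actual (reference) classes are assumptions for illustration. Under that convention the row-conditioned value relates to commission error and the column-conditioned value to omission error. The 2 × 2 counts are made up.

```python
def cohens_kappa(m):
    """Overall kappa for a square confusion matrix of counts."""
    k = len(m)
    n = sum(map(sum, m))
    rows = [sum(r) for r in m]
    cols = [sum(m[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(m[i][i] for i in range(k)) / n               # observed agreement
    p_e = sum(rows[i] * cols[i] for i in range(k)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def conditional_kappa(m, i, by="row"):
    """Conditional kappa for class i, conditioned on its row or column total.

    Assumption: rows = predicted (map) classes, columns = actual (reference).
    """
    k = len(m)
    n = sum(map(sum, m))
    r = sum(m[i])                         # row (predicted) marginal total
    c = sum(m[j][i] for j in range(k))    # column (actual) marginal total
    agree = n * m[i][i] - r * c
    if by == "row":
        return agree / (n * r - r * c)
    return agree / (n * c - r * c)


m = [[45, 5], [10, 40]]  # hypothetical counts
print(round(cohens_kappa(m), 4))                   # 0.7
print(round(conditional_kappa(m, 0, "row"), 4))    # 0.7778
print(round(conditional_kappa(m, 0, "col"), 4))    # 0.6364
```

The two conditional values for the same class differ, which illustrates why a practitioner using conditional kappa must always state whether the predicted or the actual margin is meant.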
Recently, however, researchers have urged caution against the indiscriminate use of κ without regard to its proper interpretation (Ma and Redmond, 1995) or its correct formulation under stratified sampling schemes (Stehman, 1996). Although κ has traditionally been chosen over other alternatives because it adjusts for agreement due to chance alone, Foody (1992) has indicated that κ is too pessimistic: it underestimates the proportion of agreement by overestimating the chance component of the concordance. For detailed confusion matrix analysis, conditional kappa can also be calculated against the row or column marginal totals of every matrix class, allowing the accuracy of individual categories to be quantified. Conditional kappa also allows corresponding categories in two confusion matrices to be compared statistically with respect to either actual or predicted class membership (Rosenfield and Fitzpatrick-Lins, 1986). While the conditional kappa technique facilitates comparison of individual category error rates with respect to either actual or predicted class membership, it cannot be applied with respect to both predicted and actual categories concurrently. In other words, when using conditional kappa to discuss class-by-class accuracy, the practitioner must constantly specify whether the context is the predicted or the actual class membership rate.
Department of Geography, 676 SWKT, Brigham Young University, Provo, UT 84602 (shumwaym@acdl.byu.edu).
Photogrammetric Engineering & Remote Sensing (PE&RS), Vol. 63, No. 6, June 1997, pp. 735-740. 0099-1112/97/6306-735$3.00/0
Matrix normalization is another well-established confusion matrix analysis procedure (Feinberg, 1970). In contrast to kappa-based methods, matrix normalization provides four principal advantages:
1. For any class represented in the normalized matrix, its main diagonal entry provides a single summary measure of class accuracy with respect to both the predicted and actual marginal totals. Unlike conditional kappa, there is no need to refer to the actual or predicted dimension.
2. For any class in the normalized matrix, its main diagonal entry takes direct account of both the errors of omission and the errors of commission for that class. This incorporation of the off-diagonal cell values results from the iterative balancing process that creates the normalized matrix (Congalton et al., 1983).
3. Because the row and column marginal totals of a normalized confusion matrix sum to a constant, corresponding cell values in two normalized matrices can be compared directly by inspection.
4. When comparing normalized matrices, any pair of corresponding cells can be compared, whereas conditional kappa is limited to the main diagonal cells.
Statistical significance has been the historical bane of normalized matrix analysis. Normalized cell values have no known parametric sampling distribution; thus, there is no parametric way to determine whether a cell value differs significantly from zero. Furthermore, when contrasting cell values in two normalized matrices, there is no parametric method of determining whether apparent differences are statistically significant; the user is limited to comparison by visual inspection.
Bootstrapping is a Monte Carlo method of estimating a statistic's sampling distribution when no parametric estimator exists. The bootstrap procedure resamples the original sample data, with replacement, many times. For each new sample, the statistic of interest is calculated and recorded. The frequency distribution of the statistic across these repetitions is then used as an approximation of its sampling distribution. Once this distribution has been built, estimates of the standard error and tests of significance can be derived from it using a variety of methods (Efron and Gong, 1983). Unless there is some reason to suspect that the sampling distribution of a statistic is non-normal, there is little reason to determine its standard error by bootstrapping when …
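The normalization and bootstrap steps described above can be sketched in Python. This is a minimal illustration under stated assumptions. It is not necessarily the procedure used in this paper, nor any of the paper's three standard-error methods. The assumptions are:

- Each sample unit is a hypothetical (reference, predicted) pair of integer class codes.
- Normalization uses iterative proportional fitting to unit row and column sums, which is one common way to carry out the iterative balancing.
- A hypothetical constant `PSEUDO = 0.1` is added to every cell so that resampled matrices can always be balanced.
- Units are resampled as if drawn by simple random sampling.

The function `compare_class` returns a z score, which assumes the difference is roughly normal, together with a nonparametric percentile interval for the difference.

```python
import random
import statistics

PSEUDO = 0.1  # hypothetical constant added to every cell so no row/column sums to 0


def confusion(pairs, k, pseudo=PSEUDO):
    """Tally (reference, predicted) pairs into a k x k matrix; rows = predicted."""
    m = [[pseudo] * k for _ in range(k)]
    for ref, pred in pairs:
        m[pred][ref] += 1
    return m


def normalize(m, tol=1e-10, max_iter=10_000):
    """Iterative proportional fitting: alternately rescale rows and columns
    until every row and column sums to 1 (the matrix must be balanceable)."""
    k = len(m)
    a = [[float(v) for v in row] for row in m]
    for _ in range(max_iter):
        for i in range(k):
            s = sum(a[i])
            a[i] = [v / s for v in a[i]]
        for j in range(k):
            s = sum(a[i][j] for i in range(k))
            for i in range(k):
                a[i][j] /= s
        # After the column pass every column sums to 1; stop once rows do too.
        if all(abs(sum(row) - 1.0) < tol for row in a):
            break
    return a


def diagonal(pairs, k):
    """Normalized main-diagonal values for one sample."""
    norm = normalize(confusion(pairs, k))
    return [norm[i][i] for i in range(k)]


def bootstrap_diagonal(pairs, k, reps=1000, seed=0):
    """Resample sample units with replacement; record each replicate's diagonal."""
    rng = random.Random(seed)
    return [diagonal(rng.choices(pairs, k=len(pairs)), k) for _ in range(reps)]


def compare_class(pairs_a, pairs_b, k, cls, reps=1000, alpha=0.05):
    """Compare class `cls` between two independent samples.

    Returns (z, (lo, hi)): a z score built from bootstrap standard errors
    (assumes approximate normality) and a percentile interval for the
    difference in normalized diagonal values (no normality assumption).
    """
    a = [d[cls] for d in bootstrap_diagonal(pairs_a, k, reps, seed=1)]
    b = [d[cls] for d in bootstrap_diagonal(pairs_b, k, reps, seed=2)]
    se = (statistics.stdev(a) ** 2 + statistics.stdev(b) ** 2) ** 0.5
    z = (diagonal(pairs_a, k)[cls] - diagonal(pairs_b, k)[cls]) / se
    # Replicates from two independent bootstraps can be paired arbitrarily.
    diffs = sorted(x - y for x, y in zip(a, b))
    lo = diffs[int(alpha / 2 * reps)]
    hi = diffs[int((1 - alpha / 2) * reps) - 1]
    return z, (lo, hi)
```

For two classifications assessed with independent samples, `compare_class(pairs_a, pairs_b, k, cls)` yields both results. Under these assumptions, |z| above about 1.96 or an interval that excludes zero would suggest a difference at roughly the 5% level. Resampling whole sample units rather than individual cells mirrors a simple random sampling design. A stratified design would call for resampling within strata.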
Publication date: 2006